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Abstract 

In applied sciences, we often deal with deterministic simulation models that are too 
slow for simulation-intensive tasks such as calibration or real-time control. In this pa- 
per, an emulator for a generic dynamic model, given by a system of ordinary non-linear 
differential equations, is developed. The non-linear differential equations are linearized 
and Gaussian white noise is added to account for the non-linearities. The resulting linear 
stochastic system is conditioned on a set of solutions of the non-linear equations that 
have been calculated prior to the emulation. A path-integral approach is used to derive 
the Gaussian distribution of the emulated solution. The solution reveals that most of the 
computational burden can be shifted to the conditioning phase of the emulator and the 
complexity of the actual emulation step only scales like 0{Nn) in multiplications of ma- 
trices of the dimension of the state space. Here, N is the number of time-points at which 
the solution is to be emulated and n the number of solutions the emulator is conditioned 
on. 

The applicability of the algorithm is demonstrated with the hydrological model logSPM. 
Keywords: dynamic emulator, path- integral 

1 Introduction 

In applied sciences, we often have deterministic simulation models at hand that are, although 
quite accurate, too slow for many simulation-intensive tasks such as calibration or real-time 
control. The purpose of an emulator (e.g. [5]) is a fast interpolation of the response surface 
of the model. Therefore, the slow deterministic simulation model is simplified and noise is 
added to account for the errors due to simplification. The resulting fast stochastic model is 
then conditioned with outputs from the simulation model that have been produced off-line, 
that is, prior to the actual emulation. 

In this paper, we focus on dynamic models, i.e., models described by ODE's whose outputs 
are given by time-series. Treating time as an additional output component or as an additional 
input [1] and applying a standard Gaussian emulator leads to an emulation time that grows 
quadratically with the number of time points, which is inefficient if the number of time points 
is large. Emulators for the time-stepping [2], [3] have the disadvantage that the whole state- 
space must be emulated if one wants to retain the Markov property of the process. Simplified 
models in the form of stochastic linear models [8] or linear combinations of (wavelet) basis 
functions pQ have been used as well. But none of the mentioned approaches uses knowledge 
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about the dynamics of the simulation model, which we might retrieve, e.g., by linearizing 
its equations. Recently, Reichert et al. [9] developed a dynamic emulator whose underlying 
simplified model is a linear stochastic process that captures the first order dynamics of the 
model. That is, their emulator is to some extent mechanism-based and not merely statistical. 
Furthermore, they applied a Kalman filter in order for the complexity to grow only linearly 
with the number of time points. 

The intention behind this paper is to further improve the computational efficiency of the 
emulator presented in [9]. I start with a time-continuous linear stochastic process whose drift 
is assumed to be given by the linearization of the simulation model and whose noise is assumed 
to account for the non-linearities. This is the same as in |9], except that in [9] the noise is not 
integrated between time steps. For piece- wise constant input, I derive an analytic solution for 
the Gaussian distribution describing the emulated output. For this purpose, a path- integral 
approach seems to be adequate. The analytic solution reveals that most of the computational 
burden can be shifted to the conditioning phase of the emulator so that we are left with a 
computational complexity for the emulation step that grows like O(Nn) in multiplications 
of matrices of the dimension of the state space, m. Here, N is the number of time points 
and n the number of simulation outputs on which the stochastic model is conditioned. This 
is quite a substantial improvement of the algorithm presented in [S], whose emulation step 
needs 0{N) multiplications of matrices of dimension nm as well as inversions of matrices of 
dimension m. 

Just like in [9], the algorithm is then tested with the hydrological model logSPM. 

2 A Generic Dynamic Emulator 

Consider a state space V of dimension m, whose elements shall be denoted by £, and a 
deterministic simulation model, given by a system of ordinary differential equations 



where x E W denotes inputs and/or parameters and can be time- varying. Subsequently, I 
refer to x as input and usually omit its time argument. 

The idea behind the emulator is, firstly, to linearize eq. ([I]) and pack all the non-linearities 
into a noise term that is modeled with a standard Wiener process r](t) (i.e. Gaussian white 
noise). The covariance of the noise, C, is assumed to be independent of the input. Thus, the 
linear stochastic approximation to ([I]) is given by the system of linear stochastic differential 
equations 



Secondly, n + 1 replica of the system are coupled. Therefore, replace V by V 0M n+ and W by 
W = W <S> M n+1 . Henceforth, vectors without indices will denote elements of these extended 
spaces. The n + 1 replica are associated with n + 1 different inputs, x". The first n inputs 
are those for which solutions of ([I]) are calculated that will be used for the conditioning while 
the (n + l)th input is the one for which a solution is to be emulated. The replica are assumed 
to couple through the noise term only. Thus, A(x) and b(x) now denote the tensors 



= /(£(*), x(i)) 






A 



(x) = A(x>; 



(3) 
(4) 
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The closer the inputs (measured with some metric p on W) the stronger the associated replica 
are assumed to couple. Hence, set 



C(x) = C ® R(x) , 



with 



R al3 (x) = exp(-p(x a ,x /3 )/2) . 
Thus, the emulator is described by the nm coupled linear stochastic differential equations 

e(t) = A(x)e(*)+b(x)+c(x)i ? (t). (5) 

Next, I derive the probability density of on the space of paths [to, ijv] — > V ® M n+1 , 
with initial condition 

e(t ) = o. (6) 

It reads 



P[£(t)] oc exp 



1 f ijv / \ t 

(£(t) - A(x)£(i) - b(x) ) (CC T )- 1 (x) (£(*) - A(x)£(i) - b(x) ) eft 



to 



cxp 



9 / (£(*) - D^bCx))* (D^CC^D)^) (£(t) - ^- x b(x)) A 
z ./to 



where 



^ = |-A(x). 



(7) 
(8) 



To proceed, I need to determine the Green's function of D. The most general solution of 

D{t)G(t,t') = 5{t-t') (9) 
is given by (see, e.g., [6] Chapter 3.3) 



G(t, if) = (0(i - + c(0)P exp 



A(x(r))dr 



(10) 



where 0(i — t') is the regularized Heavyside function with 8(0) = 1/2 and V denotes path- 
ordering of the exponential. The function c(t') is determined by the boundary condition ([6]), 
which entails 

G(t ,t) =0, 

and translates into 

c(0 = . 

Then, the adjoint Green's function reads as 

G\t,t') = G(t' -t)Vexp -[ A t (x(t))oIt . (11) 



Now, I calculate the correlation functions for two replica at two different time points, as 
expressed by the m x m matrices 



Sjf = <H*i)®^&)> = ^/ exp -^ t W^ t (C'C' T )- 1 J D(xX(t) 



rte)®^)^, (12) 
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with 



Z 



exp 



Using (10) and (11) one finds that, for ti > tj, 

(CC T ) a P{x(t'))Vexp 



tiff 



V exp 



t 



A(x a (r))dT 



A T (-x.P(T))dT 



For ti < tj, one may use the symmetry relations 



0a\T 



dt' . (13) 



(14) 



All finite dimensional marginals of ([7]) will be Gaussians. I consider the finite-dimensional 
subspace of those components of the first n replica that are simulated with ([!]) and those 
components of the (n + l)th replica that shall be emulated, both at time points to < t\ < 
■ ■ ■ < tjsf. Therefore, I introduce the operator H(x.) that is defined as 

(H( X )[z(tm=H(x a (t i ))e(t i ), (is) 

where, on the r.h.s., H(x. a (ti)) =: Hf denotes matrices of constant rank m! < m. The image 
of (15) is supposed to be determined by eq. 

(ff(x)[£(t)])? = y?. (16) 



Integrating out all degrees of freedom that are not determined by ( 16 ) yields the Gaussian 
distribution!]] 



z- 1 j put)nHm)]-y)n 

\ J ${t)D\CCF)- x Dt{t)dt 



oc / exp 



with covariance matrix 
and mean 



5(H£(t)]-(y-HD- L b))VZ 



oc exp 



1 



(y - i?D- 1 b) t S" 1 (y - HD~ L b 



-li 



S = HD- 1 (CC T )(D^)- 1 H^ 
z = HD~ l h . 



(17) 



(18) 



(19) 

The covariance matrix is a square matrix of dimension (n + l)Nm', whose m! x m! blocks are 
given by the equations 



(20) 



with E as defined by (13) and (14). 

Finally, I determine the Gaussian distribution for the (n + l)th replica (the online system), 
conditioned on the simulations y", for i = 1, . . . , N and a = 1, . . . , n. Therefore, I split the 
(n + l)th replica off, writing £ as the block matrix 



j-m+l,n+l 




VJ.,n+l 





(21) 



1 To keep the notation simple, we will omit the dependence of H, D, C, and b on x subsequently. 
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Mean and co-variance matrix of the online system are given by equations 



£ = £ 



y = z n+1 + S n+1 ' a (S / )^ 1 (y 6 -z 6 ) 



ah 

n+l,n+l vnn+l,o/p/\-ly>(),ii+l 



(22) 
(23) 



where a and b run from 1 to n only. Note that eq. (22) is translational invariant. Thus, one 
may replace condition ^ by 

£(0)=£o. (24) 



In the remainder of this chapter, a recursive procedure of calculating (22) and (23) is 



developed. Due to path-ordering eqs. (13) and (20) can be written as 



^fc=o ' 



(25) 



with 



o/3 
9 k 



•fc+i 



"P exp 



•fe+i 



A(x Q (r))dr 



(C , C T )^(i')^exp 



^ T (x p (r))dr 



fe+i 



/if = V exp 



— -p eX p 



'2+1 



A(x Q (r))dr 



^ T (x a (r))dr 



The boundary conditions Q or (24) imply the initial variances 



^00 — U • 



(26) 
(27) 
(28) 

(29) 



In the conditioning step of the algorithm one calculates (£') 1 and z a , for a = 1, . . . ,n. For 
the former, use (14), (20) and (29) and, for j < i, the recursion relations 



(30) 
(31) 



For the latter, set 

and use the recursion relations 



zf +1 = ^z? + k?, z£ = 0. 



with 



U+i 



V exp 



A(x a (r))dr 



b a (t')dt' 



(32) 



(33) 



Once (£') and all the z a are calculated, pre-calculate, for the emulation step, the covectors 



z' ia :=2S(fly) T ((E')- 1 )2(yJ--4) 



(34) 
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where, on the r.h.s., the i's and the a's are not summed over and with 



rpa 

ij 



ht^.-.tf", j>i + 2, 



1. 
0, 



j = i + l, 
else . 



(35) 



In the actual emulation step calculate ( 22 ) setting 



ttTI+1 ~ 



and using the recursion relation 



n+l | n+l,a / 



(36) 



(37) 



with z' ia as defined in (34). In order to get the start value, yi, one needs to calculate Sjt ,a 
using (14), (29) and the recursion relations (30) and (31). The computational complexity of 
the emulation step is of the order 0{Nn) in matrix multiplications of dimension m. If one is 
interested in the variances, i.e., the diagonal elements of X, one may derive a similar recursion 
formula for them. 

Since path-ordered exponentials can, in general, not be calculated analytically, I consider 
the special case of piece-wise constant input 

x a (t) = xf , U < t < t i+l . 



Then, (|26j), (|27j) and (|33j) reduce to 

rtk+i 



an i J-.C 

9k = \ k 



,(t k+1 -t')A(x%) cc T e (t k+1 -t')(A(xP)) T ^ 



h a _ g (ti+i-*j)A(xf) ^ 

h+1 e^-^ A ^bfdt' 



K 



(38) 
(39) 
(40) 



If .A(x) is diagonalizable, functions (38) through (40) can be obtained analytically. For 



one gets 
with 

and 
and 



,4(x£) = M fc a dia go [A«] (Mg)- 1 



gf = (Rf) 2 M%Bf(Mef 
exp((t k+1 -t k )(Xl p + \l q ))-l 



'k ) <? 



\a i \P 
A k,p A k,q 



((M fc a )- 1 CC' T ((Mf)- 1 ) T f 



q ; 



^ = Mf diag D [exp((i m - *,)A£ )] (M?)" 1 , 
exp((tj + i 



k« = Mfdia go 



(M, 



(41) 
(42) 

(43) 
(44) 
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If x is time-independent (e.g. parameters of the model) and A(x) diagonalizable, (20) can 
be calculated explicitly. If 



,4(x a ) = M a diag D [A"] (M 



a\-l 



one derives from (20) that, for ti > tj, 



tff = (R? f, ) 2 M a B?f (M") T , 



where 

{B°f)% = ((M a )- 1 CCf r ((MP) T )- 1 y q T expfoA? + tj\% - t>{\« + X^dt* 

J to 



= ((M^C^dM^yy, V * ' P - . (45) 

A^ + Xq 

3 Hydrological Application 

In this section, the algorithm developed in the last section is tested with a simple hydrological 
model called logSPM [7j. The state vector of this model is three-dimensional, 

£ = (h s ,h gw ,h r ) T , (46) 

and describes the amount of water stored in the soil, the ground- water and the river. The 
dynamics is described by the system of ordinary differential equations 

"s — Qrain Qrunoff Qet Qlat Qgw j 

(47) 

h gw = Qgw — qbf — Qd P , (48) 

K = Qrunoff + Qlat + Qbf ~ Qr , (49) 

and visualized in Fig. [TJ The fluxes are given by the equations 



Irani — °rain\ 



i\t) j Qrunoff — fsatirain 

(t) , (50) 

Qet fet^pet (t) , qiat — fsatQlat,max > (51) 

Qgw — fsatQgw.max > Qbf — kf,fhg W , (52) 

Qdp = kdphg W , q r = k r h r , (53) 

with the fraction of saturated area, f sa t , given by equation 

fsat = z j _u u : , (54) 

1 + SFe Kalls 1 + sf 

and the fraction of actual evapotranspiration, f e t, given by equation 

U = 1 - e~ k ^ . (55) 
The output of the model is the river flow, Q r , given as 

Qr = Awq r , 
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where Aw is the area of watershed. 

The linearization of the model equations reads: 



with 

and 

with 

and 



A(x) 




%(t) 
a A 2 
c(t) b A 3 



(56) 
(57) 



x(i) = (Ai(t), A 2 , A 3 , a, b, c(t), i ra in{t)) T , 

O = 0'satQgw,max j & = j c(t) = dsatiirainify ~\~ Qlat,rnax) j 



(58) 



Al(£) — —a sa t(i r ain{t) + qiat,max + Qgw,max) ~ detipet{t) , A 2 — ~^bf ~ ^dp , ^3 — ~k r ■ (59) 



The functions (54) and (55) were approximated by linear functions that intersect the nonlinear 
functions at h Sj \ and h s> 2, respectively. See Fig. [2j Therefore, 



1 



1 



O-sat 



and 



1 

h SjX \l + s F e- k ^hs,i l + s F J ' 
1 



a e t 



h 



s,2 



1 



— k et h s 



(60) 



Only the inputs i ra in and i pe t are time-dependent, and, therefore, c, b and Ai. The observation 
matrices read as 

'0 

v -A w \% i 

I choose the Euclidean metric on the space of parameters 

^ = {k s , Sp, k e t, qiat,max: 1gw m ax> ^bfi kdp, k r ) T , 

where each component is normalized with a reasonable range. The noise C was chosen to be 
diagonal and the diagonal entries to be a certain fraction of the initial condition £o- 
Obviously, A(x) is diagonalizable: 



with 



M-\t)A( X )M(t) = diag [A ], 



M(t) 



1 




1 



Ai-A 2 

c(Ai-A 2 )+afr b I 

\(Ai-A 2 )(Ai-A 3 ) A 2 -A 3 1 / 



(61) 



(62) 



and the matrices (41), (43) and (44) can be calculated analytically. Plot [3] compares solutions 
of the full model with emulated solutions for 5 randomly chosen sets of parameters (that were 
not used for the conditioning of the emulator). The results are very similar to those obtained 
in [9]. For an extended statistical analysis of the performance of the emulator, I refer to [9]. 



S 



4 Conclusions 



I have presented an explicit solution for the emulation of the time-series of a dynamic model. 
In general, the path-ordered exponentials the solution is expressed with cannot be calculated 
analytically. Therefore, I resort to piece-wise constant inputs. Then, the emulator presented 
in this paper is the same as the one presented in [9] , except that I integrate the noise between 
time steps. For piece-wise constant input, this can be done at negligible additional cost and 
potentially increases the quality of the emulation. 

The exact solution presented in eqs. (22) and (23) allows for an efficient numerical imple- 
mentation that is of the order O(nN) in matrix multiplications of dimension m. The Kalman 
filtering and smoothing algorithm used in [9] needs O(N) matrix multiplications of dimension 
nm and matrix inversions of dimension m. 

The disadvantage of my method, however, as compared to the one presented in [9] is the 
fact that a huge matrix of dimension Nnm! needs to be inverted for the conditioning, which 
might be challenging both for the memory and the CPU. 
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Figure 1: Visualization of the fluxes in the model logSPM. Taken from J. Comp. Stat, and 
Data Analysis. 
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Figure 2: Shape of the nonlinear functions used for describing some fluxes in the hydrological 
model. Fraction of saturated are, left, and fraction of actual evapotranspiration, right. The 
bold parts of the curves represent the range of values covered in the base simulation. The 
straight lines represent linearizations that intersect the nonlinear function at given values of 
h s . Taken from J. Comp. Stat, and Data Analysis. 
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Figure 3: Comparison of simulations of the full model (left) with emulations (right) for 5 
randomly chosen sets of parameters. The dominant model input, rain intensity, is plotted 
from the top (right scale). The emulator was conditioned with 50 sets of parameters. The 
d-value is the square root of the mean sum of squares. 
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